library(reticulate)
reticulate::use_virtualenv("r_python_worksessions")
Visualizations with R and Python
Reading data into R/Python
Below is how we will read our data from the /data directory in both R and Python. R users will use the readr and readxl packages, while Python users will read data with pandas. Let's view the examples below.
For R, we first need to load a few packages.
library(tidyverse)
library(readxl)
library(here)
ca_np <- readr::read_csv(here::here("data", "ca_np.csv"))
ci_np <- readxl::read_xlsx(here::here("data", "ci_np.xlsx"))
glimpse(ca_np)
Rows: 789
Columns: 7
$ region <chr> "PW", "PW", "PW", "PW", "PW", "PW", "PW", "PW", "PW", "PW", …
$ state <chr> "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", …
$ code <chr> "CHIS", "CHIS", "CHIS", "CHIS", "CHIS", "CHIS", "CHIS", "CHI…
$ park_name <chr> "Channel Islands National Park", "Channel Islands National P…
$ type <chr> "National Park", "National Park", "National Park", "National…
$ visitors <dbl> 1200, 1500, 1600, 300, 15700, 31000, 33100, 32000, 24400, 31…
$ year <dbl> 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, …
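For Python, we need to import pandas and Path from pathlib. Path plays a role similar to the here() function used in the R code: it builds the file paths, although the relative paths below are resolved from the current working directory rather than from the project root.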
import pandas as pd
from pathlib import Path
ca_np = pd.read_csv(Path("../../data/ca_np.csv"))
ci_np = pd.read_excel(Path("../../data/ci_np.xlsx"), engine="openpyxl")
ca_np.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 789 entries, 0 to 788
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 region 789 non-null object
1 state 789 non-null object
2 code 789 non-null object
3 park_name 789 non-null object
4 type 789 non-null object
5 visitors 789 non-null int64
6 year 789 non-null int64
dtypes: int64(2), object(5)
memory usage: 43.3+ KB
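In R, here::here() locates the project root (for example, by looking for a `.here` or `.Rproj` file) and builds paths from there, whereas `Path("../../data/ca_np.csv")` only works when the code runs from a folder two levels below the project root. A minimal sketch of a here-like helper using only pathlib is shown below. The helper names `find_project_root` and `here` are hypothetical, and the sketch assumes the project root is marked by a `.here` file; it is not a port of the R package.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = ".here") -> Path:
    """Walk up from `start` until a directory containing `marker` is found.

    Hypothetical helper: assumes the project root is flagged by a marker
    file, similar in spirit to how here::here() finds the root in R.
    """
    start = start.resolve()
    for directory in (start, *start.parents):
        if (directory / marker).exists():
            return directory
    raise FileNotFoundError(f"No '{marker}' marker found above {start}")


def here(*parts: str, start: Optional[Path] = None) -> Path:
    """Build a path from the project root, e.g. here("data", "ca_np.csv")."""
    root = find_project_root(start if start is not None else Path.cwd())
    return root.joinpath(*parts)
```

With a `.here` file at the project root, the read calls could become `pd.read_csv(here("data", "ca_np.csv"))` no matter which subfolder the notebook runs from.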
Creating visuals: Visitors to Channel Islands NP
Below we will create our first basic visual with both R and Python. In R, we will use only ggplot2 to create this visual. In Python, we will use plotnine, which follows R's ggplot2 syntax.
For R, ggplot2 is already available because it was loaded with the tidyverse above.
ggplot2::ggplot(data = ci_np, aes(x = year, y = visitors)) +
  geom_line()
import matplotlib.pyplot as plt
from plotnine import ggplot, aes, geom_line

# Use the non-interactive Agg backend so figures render without a display
plt.switch_backend('agg')
# Creating the plot
plot = (
    ggplot(data=ci_np, mapping=aes(x='year', y='visitors'))
    + geom_line()
)
# Displaying the plot
print(plot)
<string>:3: FutureWarning: Using print(plot) to draw and show the plot figure is deprecated and will be removed in a future version. Use plot.show().
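In both ggplot2 and plotnine, `aes()` binds columns to visual channels (here `year` to x and `visitors` to y), and `geom_line()` connects the observations in order of the x variable rather than row order. The toy sketch below, which uses only the standard library and the hypothetical helper names `map_aesthetics` and `line_segments`, shows conceptually what those two pieces compute; it is not how plotnine is implemented. The three records are the first Channel Islands rows from the glimpse() output above, deliberately listed out of order.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]
Point = Tuple[float, float]


def map_aesthetics(data: Sequence[Record], mapping: Dict[str, str]) -> Dict[str, List[float]]:
    """Toy stand-in for aes(): pull each mapped column out of row-wise records."""
    return {aesthetic: [row[column] for row in data] for aesthetic, column in mapping.items()}


def line_segments(aes_values: Dict[str, List[float]]) -> List[Tuple[Point, Point]]:
    """Toy stand-in for geom_line(): sort points by x, then join neighbours."""
    points = sorted(zip(aes_values["x"], aes_values["y"]))
    return list(zip(points, points[1:]))


records = [
    {"year": 1965, "visitors": 1600},
    {"year": 1963, "visitors": 1200},
    {"year": 1964, "visitors": 1500},
]
segments = line_segments(map_aesthetics(records, {"x": "year", "y": "visitors"}))
print(segments)  # → [((1963, 1200), (1964, 1500)), ((1964, 1500), (1965, 1600))]
```

Sorting by x before joining points is what keeps the line moving forward in time even when the rows are not stored by year.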
Now that we have created a basic plot with each package/library, let's create a base plot object and build variations from it. This will be done similarly to the R for Excel
gg_base_r <- ggplot(data = ci_np, aes(x = year, y = visitors))

gg_base_r +
  geom_point()
from plotnine import ggplot, aes, geom_point

gg_base_py = ggplot(data=ci_np, mapping=aes(x='year', y='visitors'))
gg_base_py + geom_point()
<string>:1: FutureWarning: Using repr(plot) to draw and show the plot figure is deprecated and will be removed in a future version. Use plot.show().
<Figure Size: (640 x 480)>
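Reusing `gg_base_r` or `gg_base_py` as a base works because adding a layer with `+` produces a new plot and leaves the base object unchanged. The toy sketch below illustrates that idea with Python's `__add__` operator. The classes `Plot` and `Layer` are hypothetical and only mimic the pattern; the sketch assumes, as plotnine appears to do, that `+` returns a copy rather than modifying the base plot in place.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Layer:
    """Toy stand-in for a geom such as geom_point() or geom_line()."""
    geom: str


@dataclass(frozen=True)
class Plot:
    """Toy plot spec: a data name, an aesthetic mapping, and its layers."""
    data: str
    mapping: Tuple[Tuple[str, str], ...]
    layers: Tuple[Layer, ...] = ()

    def __add__(self, layer: Layer) -> "Plot":
        # Return a NEW plot with the layer appended; `self` is untouched,
        # so one base plot can seed several variations.
        return replace(self, layers=self.layers + (layer,))


gg_base = Plot(data="ci_np", mapping=(("x", "year"), ("y", "visitors")))
points_plot = gg_base + Layer("point")
line_plot = gg_base + Layer("line")
both = gg_base + Layer("point") + Layer("line")  # evaluated left to right
print(gg_base.layers)  # → ()
print(both.layers)     # → (Layer(geom='point'), Layer(geom='line'))
```

Making the classes immutable (`frozen=True`) is one simple way to guarantee that a variation can never alter the shared base.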